Combating against Web Spam through Content Features

Authors

  • Muhammad Iqbal
  • Malik Muneeb Abid

Abstract

Web spamming refers to the use of unethical search engine optimization practices to gain a better position on the Search Engine Result Page (SERP). Deciding whether a web page is spam or ham (legitimate) is a complicated issue because different search engines apply different standards. Link-based spamming, cloaking, and content spamming are the main focus of existing anti-spam techniques. Although these techniques have had considerable success, they still struggle against new kinds of spamming. This paper applies several machine learning methods to web spam detection, treated as a supervised classification problem. We used the public WEBSPAM-UK-2007 data set in our experiments, and the final results are compared and analyzed across well-known classifiers. The results show that JRip and J48 outperform the other two methods.
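The abstract frames spam detection as supervised classification over page content. However, this page does not list the paper's features or its settings for JRip and J48. The following is a rough illustration only: a minimal standard-library Python sketch built on stated assumptions.

- The four features (`num_words`, `avg_word_len`, `compression`, `top_word_frac`) are hypothetical examples of content features.
- The toy pages are invented.
- A one-level decision stump, chosen by information gain, stands in for a real learner.

It is neither J48 (Weka's C4.5 implementation, which grows a full tree using gain ratio) nor JRip (Weka's RIPPER rule learner).

```python
import math
import re
import zlib
from collections import Counter

# Hypothetical sketch: the features, toy data, and learner are illustrative
# assumptions, not the setup used in the paper.
Row = dict[str, float]


def content_features(text: str) -> Row:
    """Compute a few illustrative content features for one page's visible text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    n = len(words)
    raw = text.encode("utf-8")
    return {
        "num_words": float(n),
        "avg_word_len": sum(map(len, words)) / n if n else 0.0,
        # Repetitive (keyword-stuffed) text compresses better, raising this ratio.
        "compression": len(raw) / len(zlib.compress(raw)) if raw else 0.0,
        # Share of all tokens taken by the single most frequent word.
        "top_word_frac": Counter(words).most_common(1)[0][1] / n if n else 0.0,
    }


def entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def majority(labels: list[str]) -> str:
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows: list[Row], labels: list[str]) -> tuple[str, float, str, str]:
    """Pick the single (feature, threshold) split with the highest information gain."""
    base, n = entropy(labels), len(labels)
    best_gain, best = -1.0, None
    for feat in rows[0]:
        values = sorted({r[feat] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[feat] <= t]
            right = [y for r, y in zip(rows, labels) if r[feat] > t]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if gain > best_gain:
                best_gain, best = gain, (feat, t, majority(left), majority(right))
    if best is None:  # no feature varies: always predict the majority class
        m = majority(labels)
        best = (next(iter(rows[0])), math.inf, m, m)
    return best


def predict(stump: tuple[str, float, str, str], row: Row) -> str:
    """Apply the stump's threshold test to one feature row."""
    feat, t, label_le, label_gt = stump
    return label_le if row[feat] <= t else label_gt


# Invented toy pages: ordinary prose versus keyword-stuffed text.
HAM_PAGES = [
    "The committee met on Tuesday to review the annual budget and discuss new library hours.",
    "Our hiking guide covers trail conditions, safety tips, and the best seasons to visit the park.",
    "This recipe explains how to bake sourdough bread with a simple starter and patient proofing.",
]
SPAM_PAGES = [
    "cheap pills cheap pills cheap pills buy cheap pills now cheap pills online cheap pills",
    "casino bonus casino bonus casino bonus free casino bonus casino casino bonus win",
    "loans loans fast loans cheap loans loans now loans online loans loans loans",
]

if __name__ == "__main__":
    rows = [content_features(p) for p in HAM_PAGES + SPAM_PAGES]
    labels = ["ham"] * len(HAM_PAGES) + ["spam"] * len(SPAM_PAGES)
    stump = train_stump(rows, labels)
    print("split on", stump[0], "at", round(stump[1], 3))
    print([predict(stump, r) for r in rows])
```

On real data, one would extract such features for each labeled host in WEBSPAM-UK-2007 and evaluate on a held-out split. A full tree or rule learner would then replace `train_stump`; the stump only keeps the sketch short and dependency-free.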

Similar Resources

Application of Machine Learning in Combating Web Spam

High ranking of a Web site in search engines can be directly correlated to high revenues nowadays. This amplifies the phenomenon of Web spamming which can be defined as preparing or manipulating any features of Web documents or hosts to mislead search engines’ ranking algorithms to gain undeservedly high position in search results. Web spam remarkably deteriorates the information quality availa...

Web Spam Detection

Definition: Web spam refers to a host of techniques to subvert the ranking algorithms of web search engines and cause them to rank search results higher than they would otherwise. Examples of such techniques include content spam (populating web pages with popular and often highly monetizable search terms), link spam (creating links to a page in order to increase its link-based score), and cloakin...

Feature Selection-model-based Content Analysis for Combating Web Spam

With the increasing growth of the Internet and the World Wide Web, information retrieval (IR) has attracted much attention in recent years. Quick, accurate and high-quality information mining is the core concern of successful search companies. Likewise, spammers try to manipulate IR systems to fulfil their stealthy needs. Spamdexing (also known as web spamming) is one of the spamming techniques of adversari...

Fighting Web Spam

High ranking of a Web site in search engines can be directly correlated to high revenues. This amplifies the phenomenon of Web spamming which can be defined as preparing or manipulating any features of Web documents or hosts to mislead search engines’ ranking algorithms to gain an undeservedly high position in search results. Web spam remarkably deteriorates the information quality available on...

An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network

In recent years, there has been considerable interest in using the short message service (SMS) as one of the essential and straightforward communication services on mobile devices. The increased popularity of this service has also increased the number of attacks on mobile devices, such as SMS spam messages. SMS spam messages constitute a real problem for mobile subscribers; this worries telecomm...

Publication date: 2015